Abstract: Content Distribution Networks (CDNs) are overlay 
networks that place content near the end clients with the 
aim of reducing delay and network congestion and balancing 
the workload, thereby improving the service quality perceived 
by the end clients. The main objective of this work is to 
construct a semantic overlay network (SON) of surrogate 
servers based on an equitable dominating set. This allows any 
replication algorithm to replicate the content to a minimum 
number of surrogate servers within the SON, and these servers 
can be accessed from anywhere. We then propose a content 
distribution algorithm named Optimal Fast Replica (O-FR) 
and apply it to distribute the content over the Equitable 
Dominating Set based Semantic Overlay Network (EDSON). We 
analyze the performance of O-FR in terms of average 
replication time and maximum replication time, and compare 
it with the existing content distribution algorithms Fast 
Replica and Resilient Fast Replica. The results indicate that 
this approach improves the service quality perceived by the 
end clients. This paper also analyzes the use of equitable 
dominating sets for the construction of semantic overlay 
networks and investigates how they help maintain uniform 
utilization of the surrogate servers. 

Index Terms — SON, Dominating set, CDN, DSON, Optimal 
Fast Replica. 

I. INTRODUCTION 

Content Delivery Networks (CDNs) have evolved to 
overcome the limitations of the Internet in terms of user 
perceived Quality of Service (QoS) when accessing web 
content. A CDN replicates content from the origin server to 
surrogate servers, scattered over the globe, in order to 
deliver content to end users in a reliable and timely manner 
from a nearby optimal surrogate. 

Apart from the pure networking issues of CDNs relevant to 
the establishment of the infrastructure, further issues must 
be addressed, such as the selection of surrogate servers for 
replication and retrieval, the content replication policy, and 
caching. In this paper, we analyze the role of surrogate 
server selection in the optimum replication of content in the 
CDN, and the application of different content replication 
policies and their working mechanisms. 

This paper is organized as follows. Section II describes 
the related work. Section III presents our design of the 
equitable dominating set based semantic overlay network and 
the Optimal Fast Replica content distribution algorithm. 
Section IV discusses the analytical study and the 
experimental results, and analyzes the performance of 
different content distribution algorithms. Finally, the 
conclusion and future work are presented in Section V. 

II. RELATED WORK 

Content Delivery Networks provide services that 
improve network performance by maximizing bandwidth, 
improving accessibility, and maintaining correctness 
through content replication. They offer fast and reliable 
applications and services by distributing content to 
surrogate servers located close to users. 

In order to offload popular servers and improve end-user 
experience, copies of popular content are often stored in 
different locations. With mirror site replication, files 
from the origin server are proactively replicated at surrogate 
servers with the objective of improving the user perceived 
Quality of Service (QoS). When a copy of the same file is 
replicated at multiple surrogate servers, choosing the server 
that provides the best response time is not trivial, and the 
resulting performance can vary dramatically depending on 
the server selected [1]. 

Laurent Massoulie [2] proposed an algorithm called the 
localizer, which reduces network load and helps to 
evenly balance the number of neighbors of each node in the 
overlay, sharing the load and improving the resilience to 
random node failures or disconnections. 

Rodriguez and Biersack [3] proposed a dynamic 
parallel-access scheme to access multiple mirror servers. In 
their study, a client downloads files from mirror servers 
residing in a wide area network. They showed that their 
dynamic parallel downloading scheme achieves significant 
downloading speedup with respect to a single server 
scheme. However, they studied only the scenario where 
one client uses parallel downloading. They did not 
address the effect and consequences when all clients 
choose to adopt the same scheme. 

L. Cherkasova and J. Lee [4] proposed the Fast Replica 
algorithm to distribute the content, in which a user downloads 
different parts of the same file from different servers in 
parallel. Once all the parts of the file are received, the user 
reconstructs the original file by reassembling the different 
parts. 

Al-Mukaddim Khan Pathan and Rajkumar Buyya [5] 
presented a comprehensive taxonomy with a broad 
coverage of CDNs in terms of organizational structure, 
content distribution mechanisms, request redirection 
techniques, and performance measurement methodologies. 
They studied the existing CDNs in terms of their 
infrastructure, request-routing mechanisms, content 
replication techniques, load balancing, and cache 
management. Dominating sets have been used successfully 
for topology control in wireless ad hoc networks [6, 7] and 
virtual backbone creation in sensor networks [8]. 

ZhiHui Lu et al. [9] proposed a novel content push 
policy called TRRR (Tree-Round-Robin-Replica), which 
yields an efficient and reliable solution for distributing 
large files in the content delivery network environment. 
They carried out small-scale experiments to verify the TRRR 
algorithm. Their experiments also demonstrated that TRRR 
significantly reduces the file distribution/replication 
time compared with traditional policies such as sequential 
unicast and multiple unicast. 

J. Amutharaj and S. Radhakrishnan [10, 11] constructed 
a dominating set based overlay network to optimize the 
number of servers for replication. They investigated the use 
of the Fast Replica algorithm to reduce the content transfer 
time for replicating the content within the semantic overlay 
network and compared its performance with the sequential 
unicast and multiple unicast content distribution strategies 
in terms of content replication time and delivery ratio. 

Srinivas Shakkottai and Ramesh Johari [12] evaluated 
the benefits of a hybrid system that combines peer-to-peer 
and a centralized client-server approach against each 
method acting alone. They investigated the relative 
performance of peer-to-peer and centralized client-server 
schemes, as well as a hybrid of the two — both from the 
point of view of consumers as well as the content 
distributor. 

Ye Xia et al. [13] considered a two-tier content 
distribution system for distributing massive content and 
proposed popularity-based file replication techniques 
within the CDN using multiple hash functions. They 
developed a set of distributed, robust algorithms and 
evaluated the performance of proposed algorithms. 

Oznur Ozkasap, Mine Caglar and Ali Alagoz [14] 
proposed and designed a peer-to-peer system, SeCond, 
addressing the distribution of large content to a large 
number of end systems in an efficient manner. It employs 
a self-organizing epidemic dissemination scheme for state 
propagation of available blocks and initiation of block 
transmissions. They showed that SeCond is a scalable and 
adaptive protocol that takes the heterogeneity of the peers 
into account. 



III. DESIGN OF EQUITABLE DOMINATING SET BASED SON WITH 
OPTIMAL FAST REPLICA FOR CONTENT DISTRIBUTION 

A. EDSON Based Surrogate Server Selection 

A Semantic Overlay Network G can be defined as 
follows. 

$G = \{V, E\}$ (1) 

where $V = \{V_1, V_2, V_3, \ldots, V_n\}$ is the set of 
surrogate servers and $E$ is the set of edges between the 
$i$-th surrogate server and the $j$-th surrogate server, 
i.e. $E = \{(V_i, V_j)\}$ such that $V_i \neq V_j$. 

Let D be a dominating set of G, with $D \subseteq V$: every 
server not in D is adjacent to at least one surrogate server 
in D. Hence, every surrogate server is a member of either D 
or $V \setminus D$. 

An equitable dominating set D is a set of r dominating 
vertices in V, so that $|D| = r$, and $V \setminus D$ is the 
set of all vertices adjacent to the dominating server set D, 
such that the degrees of the vertices in D differ by at most 
1. Each vertex v in D therefore has more or less the same 
number of neighbor nodes that are members of $V \setminus D$. 
Contents are replicated only in the set of surrogate servers 
D, which contains r surrogate servers or fewer, i.e. 
$|D| < |V|$. 
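
As a minimal sketch of these definitions, the Python fragment below checks whether a candidate set D dominates G and whether the degrees of its members differ by at most 1. The adjacency-map representation, the function names `is_dominating` and `is_equitable`, and the five-server example graph are illustrative assumptions rather than part of the original design. The degree is taken here as the full vertex degree; counting only neighbors in V \ D would be another reasonable reading of the definition.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # hypothetical adjacency map: server -> neighbors


def is_dominating(graph: Graph, d: Set[str]) -> bool:
    """True if every server is in D or adjacent to a server in D."""
    return all(v in d or graph[v] & d for v in graph)


def is_equitable(graph: Graph, d: Set[str]) -> bool:
    """True if the degrees of the servers in D differ by at most 1."""
    degrees = [len(graph[v]) for v in d]
    return not degrees or max(degrees) - min(degrees) <= 1


# Illustrative overlay of five surrogate servers (assumed topology).
graph: Graph = {
    "S1": {"S2", "S3"},
    "S2": {"S1", "S4"},
    "S3": {"S1", "S5"},
    "S4": {"S2"},
    "S5": {"S3"},
}

candidate = {"S2", "S3"}
print(is_dominating(graph, candidate), is_equitable(graph, candidate))
```

In this example {S2, S3} covers every other server and both members have degree 2, whereas {S1} alone leaves S4 and S5 uncovered.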

B. Algorithm for Formation of Equitable Dominating Set 
based SON (EDSON): 

Step 1: The algorithm begins by marking all the vertices 
of the graph white. 

Step 2: The algorithm selects the vertex with the maximal 
number of white neighbors. 

Step 3: The selected vertex is marked black and its 
neighbors are marked gray. 

Step 4: The algorithm then iteratively scans the gray 
nodes and their white neighbors, and selects the gray node 
or the pair of nodes (a gray node and one of its white 
neighbors), whichever has the maximal number of white 
neighbors. 

Step 5: The selected node or the selected pair of nodes is 
marked black, and their white neighbors are marked gray. 

Step 6: Once all the vertices are marked gray or black, 
the algorithm terminates. All the black nodes form a 
connected dominating set (CDS). 

Step 7: After forming the CDS, check the degree of each 
vertex of the connected dominating set. 

Step 8: If the degree of any vertex differs from the others 
by more than one, mark that vertex gray, find a suitable 
alternate vertex as a member of the dominating set, and mark 
it black. If no alternate vertex is found, leave the set as 
it is. 
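
The steps above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: the function names, the edge-list example, and the tie-breaking by sorted server name are assumptions. Steps 7 and 8 do not specify how the alternate vertex is chosen, so the sketch assumes a simple swap that is accepted only when the set still dominates the graph and the degree spread shrinks; it does not re-check connectivity after a swap.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]
WHITE, GRAY, BLACK = "white", "gray", "black"


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def greedy_cds(graph: Graph) -> Set[str]:
    """Steps 1-6: colour-based greedy CDS (non-empty connected graph)."""
    color = {v: WHITE for v in graph}                       # Step 1

    def whites(v: str) -> Set[str]:
        return {u for u in graph[v] if color[u] == WHITE}

    def blacken(v: str) -> None:
        for u in whites(v):
            color[u] = GRAY
        color[v] = BLACK

    # Steps 2-3: all vertices are white, so pick the maximum-degree vertex.
    blacken(max(sorted(graph), key=lambda v: len(graph[v])))

    while WHITE in color.values():                          # Steps 4-5
        best, best_score = None, 0
        for g in sorted(v for v in graph if color[v] == GRAY):
            gain = whites(g)
            candidates = [((g,), len(gain))]
            for w in sorted(gain):
                # White vertices covered by the pair, excluding w itself.
                candidates.append(((g, w), len((gain | whites(w)) - {w})))
            for nodes, score in candidates:
                if score > best_score:
                    best, best_score = nodes, score
        if best is None:
            raise ValueError("graph is not connected")
        for v in best:
            blacken(v)
    return {v for v in graph if color[v] == BLACK}          # Step 6


def is_dominating(graph: Graph, d: Set[str]) -> bool:
    return all(v in d or graph[v] & d for v in graph)


def degree_spread(graph: Graph, d: Set[str]) -> int:
    degrees = [len(graph[v]) for v in d]
    return max(degrees) - min(degrees)


def equitable_adjust(graph: Graph, d: Set[str]) -> Set[str]:
    """Steps 7-8 (assumed swap rule): reduce the degree spread of D."""
    d = set(d)
    improved = True
    while degree_spread(graph, d) > 1 and improved:
        improved = False
        for v in sorted(d):
            for u in sorted(set(graph) - d):
                cand = (d - {v}) | {u}
                if (is_dominating(graph, cand)
                        and degree_spread(graph, cand) < degree_spread(graph, d)):
                    d, improved = cand, True
                    break
            if improved:
                break
    return d  # left as it is when no improving alternate exists


edges = [("S1", "S2"), ("S1", "S3"), ("S1", "S4"),
         ("S2", "S5"), ("S3", "S6"), ("S4", "S7")]
graph = build_graph(edges)
cds = greedy_cds(graph)
edson = equitable_adjust(graph, cds)
print(sorted(cds), sorted(edson))
```

Each swap strictly reduces the degree spread, so the adjustment loop terminates; when no improving swap exists the set is returned unchanged, mirroring the "leave as it is" rule of Step 8.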

C. Working Principles of Optimal Fast Replica in EDSON 

In order to offload popular servers and improve end-user 
experience, copies of popular content are often 
replicated in multiple surrogate servers, which are scattered 
over geographically different locations, based on some 
content distribution policy. In this paper, content 
distribution policies such as sequential unicast, multiple 
unicast, Fast Replica (FR) [5, 16, 17, 18], Resilient Fast 
Replica (R-FR) [5, 16, 17, 18], and Optimal Fast Replica 
(O-FR) are used to distribute the content from the origin 
server to the set of surrogate servers in the EDSON. 

The objective of Optimal Fast Replica (O-FR) is to 
minimize the maximum replication time. The working 
principle of Optimal Fast Replica can be described as 
follows. 

Step 1: Partition the original file F into m sub-files of 
equal size: 

$Size(F_i) = Size(F) / m$ bytes, where $1 \le i \le m$ and 
$m = n/2$. 

Step 2: The origin server N opens m concurrent 
connections to surrogate servers $N_1, N_2, \ldots, N_m$. N 
sends each node $N_i$ the following file and information. 

• Surrogate server list: $R = \{N_1, N_2, \ldots, N_m\}$ (in 
the next step, sub-file $F_i$ will be forwarded to this 
server list). 

• Sub-file $F_i$. 

• Replica amount: k ($1 \le k \le m$). 

Step 3: Every surrogate server 
$N_i \in \{N_1, N_2, \ldots, N_m\}$ opens k-1 concurrent 
connections and replicates the sub-file $F_i$ to the group of 
k-1 surrogate servers defined by the set 
$\{N_j : i < j < i+k\}$, where if $j > m$ then 
$j = (j-1) \bmod m + 1$. 

In this step, every server $N_i$ in 
$\{N_1, N_2, \ldots, N_m\}$ has the following output links 
and input links. 

• k-1 output links: forwarding sub-file $F_i$ to the node 
list $\{N_j : i < j < i+k\}$, where if $j > m$ then 
$j = (j-1) \bmod m + 1$. 

• k-1 input links: receiving sub-file $F_j$ from the server 
list $\{N_j : i-k < j < i\}$, where if $j < 1$ then 
$j = j + m$. 

Step 4: At last, every node $N_i$ holds k sub-files 
$\{F_j : i-k < j \le i\}$, where if $j < 1$ then $j = j + m$. 

In the general case, the nodes $\{N_1, N_2, \ldots, N_m\}$ 
act as cache servers and support concurrent download. 
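
The placement produced by Steps 1 to 4 can be illustrated with a short Python sketch. The function names `split_file` and `ofr_placement` are hypothetical, servers and sub-files are represented only by their indices 1..m, and the wrap-around is assumed to apply when the target index exceeds m. When the file size is not divisible by m, the sketch lets the trailing sub-files be shorter, which is an assumption the text does not address.

```python
from typing import Dict, List, Set


def split_file(data: bytes, m: int) -> List[bytes]:
    """Step 1: partition the file into m sub-files of (near) equal size."""
    size = -(-len(data) // m)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(m)]


def ofr_placement(m: int, k: int) -> Dict[int, Set[int]]:
    """Return, for each server index i, the sub-file indices it holds."""
    if not 1 <= k <= m:
        raise ValueError("replica amount k must satisfy 1 <= k <= m")
    # Step 2: the origin sends sub-file F_i to server N_i.
    holdings: Dict[int, Set[int]] = {i: {i} for i in range(1, m + 1)}
    # Step 3: N_i forwards F_i to the next k-1 servers, wrapping past N_m.
    for i in range(1, m + 1):
        for j in range(i + 1, i + k):
            target = (j - 1) % m + 1
            holdings[target].add(i)
    return holdings  # Step 4: every server now holds k sub-files


for server, subfiles in ofr_placement(m=4, k=2).items():
    print(f"N{server} holds sub-files {sorted(subfiles)}")
```

With m = 4 and k = 2, server N1 ends up holding F1 and F4, which agrees with the Step 4 rule i-k < j <= i after wrap-around; with k = m every server holds all m sub-files.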

Client Content Request Processing: 

When a user client requests file F from the origin server, 
the request is redirected to the surrogate server list 
$\{N_1, N_2, \ldots, N_m\}$, and the client concurrently 
downloads every sub-file $F_i$. The sub-files are then 
reassembled into the original file F on the client machine. 

In the ideal case, when k = m, every surrogate server $N_i$ 
holds all m sub-files of the original file F and reassembles 
them to form the original file F in the local node. When the 
user requests file F from the origin server, the request is 
redirected to one surrogate server in the list 
$\{N_1, N_2, \ldots, N_m\}$ and the client downloads the 
whole file F from it. 
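
A minimal client-side sketch of this request processing is given below. It assumes an in-memory map `stores` from server index to the sub-files that server holds, standing in for real network transfers; the names `fetch` and `client_download` are hypothetical. The sub-files are fetched concurrently with a thread pool and reassembled in index order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Hypothetical store: server index -> {sub-file index: bytes}
Stores = Dict[int, Dict[int, bytes]]


def fetch(stores: Stores, j: int) -> bytes:
    """Fetch sub-file F_j from the lowest-numbered server that holds it."""
    source = min(i for i in stores if j in stores[i])
    return stores[source][j]


def client_download(stores: Stores, m: int) -> bytes:
    """Download the m sub-files concurrently and reassemble file F."""
    with ThreadPoolExecutor(max_workers=m) as pool:
        # Executor.map returns results in the order of its inputs.
        parts = list(pool.map(lambda j: fetch(stores, j), range(1, m + 1)))
    return b"".join(parts)


# Example with m = 3 and k = 2: each server holds its own and one other part.
stores: Stores = {
    1: {1: b"ab", 3: b"ef"},
    2: {2: b"cd", 1: b"ab"},
    3: {3: b"ef", 2: b"cd"},
}
print(client_download(stores, 3))
```

In the ideal case k = m, a single entry of `stores` already contains every sub-file, so the same routine degenerates to downloading the whole file from one surrogate server.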

IV. RESULTS AND DISCUSSIONS 

A. Analytical Study 

Let $Time_i$ denote the transfer time of file F from the 
origin server N to surrogate server $N_i$, as measured at 
$N_i$. Two performance metrics are considered: average and 
maximum replication times. 

Average Replication Time: $Time_{Avg}$ is the mean of the 
transfer times measured at the n surrogate servers. 

$Time_{Avg} = \frac{1}{n} \sum_{i=1}^{n} Time_i$ 

Maximum Replication Time: $Time_{Max}$ reflects the time 
when all the surrogate servers in the overlay network 
have received k sub-files ($1 \le k \le m$) of the original 
file. 

$Time_{Max} = \max \{Time_i\}$, where $i = 1, \ldots, n$ 

In an idealistic setting, all the nodes and links are 
homogeneous, and each node can support n network 
connections to other nodes at B bytes/sec. Then, 

$Time_{distribution} = Size(F) / (n \times B)$ (2) 

$Time_{collection} = Size(F) / (n \times B)$ (3) 
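
The two metrics and the idealized timing can be written down directly. The sketch below is illustrative only: the function names and the sample values (per-server times, an 8 MB file, 8 servers, 1 MB/s links) are assumptions. It takes the average as the mean of the per-server transfer times, and it adds the distribution and collection steps of equations (2) and (3) to obtain the idealized Fast Replica time.

```python
from typing import Sequence


def average_replication_time(times: Sequence[float]) -> float:
    """Mean of the per-server transfer times Time_i."""
    return sum(times) / len(times)


def maximum_replication_time(times: Sequence[float]) -> float:
    """Time at which the slowest surrogate server has finished."""
    return max(times)


def ideal_fast_replica_time(size_bytes: float, n: int, bandwidth: float) -> float:
    """Distribution step (2) plus collection step (3), homogeneous links."""
    distribution = size_bytes / (n * bandwidth)
    collection = size_bytes / (n * bandwidth)
    return distribution + collection


times = [2.0, 4.0, 6.0]  # assumed per-server measurements in seconds
print(average_replication_time(times), maximum_replication_time(times))
print(ideal_fast_replica_time(8_000_000, n=8, bandwidth=1_000_000))
```

The sum of the two steps equals 2 * Size(F) / (n * B), which is the Fast Replica entry of Table I.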

B. Performance of Content Distribution Algorithms in an 
'n' server Semantic Overlay Network 

The times taken to distribute the content over the 
Semantic Overlay Network by different content distribution 
algorithms are presented in Table I. 

TABLE I 
CONTENT DISTRIBUTION TIMES OF DIFFERENT CONTENT DISTRIBUTION 
ALGORITHMS 

Algorithm                                          | Content Distribution Time (T_D) 
---------------------------------------------------+------------------------------------- 
Sequential Unicast                                 | n * Size(F) / B 
Multiple Unicast                                   | Size(F) / B 
Fast Replica                                       | 2 * Size(F) / (n * B) 
Resilient Fast Replica without Node Failure        | 2 * Size(F) / (n * B) 
Resilient Fast Replica with Failure of 'm' servers | (2 + m/n) * Size(F) / (n * B) 
Optimal Fast Replica                               | ((k + n) / (n * n * k)) * Size(F) / B 



Replication Time proportion of different content 
distribution algorithms is tabulated in Table II. 

TABLE II 
REPLICATION TIME PROPORTION OF DIFFERENT CONTENT DISTRIBUTION 
ALGORITHMS 

Algorithm                                          | Replication Time Proportion 
---------------------------------------------------+------------------------------------- 
Sequential Unicast                                 | n 
Multiple Unicast                                   | 1 
Fast Replica                                       | 2/n 
Resilient Fast Replica without Node Failure        | 2/n 
Resilient Fast Replica with Failure of 'm' servers | (2 + m/n) * 1/n 
Optimal Fast Replica                               | (k + n) / (n * n * k) 
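
For a concrete comparison, the proportions of Table II can be evaluated for given parameters. The sketch below is a hedged illustration: the function name, the dictionary keys, and the sample values n = 8, k = 4 and two failed servers are assumptions, and the O-FR entry is read as (k + n) divided by the product n * n * k.

```python
from typing import Dict


def replication_time_proportions(n: int, k: int, failed: int = 0) -> Dict[str, float]:
    """Replication time of each scheme relative to Size(F) / B."""
    return {
        "Sequential Unicast": float(n),
        "Multiple Unicast": 1.0,
        "Fast Replica": 2 / n,
        "Resilient Fast Replica (no failure)": 2 / n,
        "Resilient Fast Replica (failures)": (2 + failed / n) / n,
        # Assumed reading of the O-FR entry: (k + n) / (n * n * k).
        "Optimal Fast Replica": (k + n) / (n * n * k),
    }


props = replication_time_proportions(n=8, k=4, failed=2)
for name, value in props.items():
    print(f"{name:38s} {value:.6f}")
```

Calling the same function with the size r of the equitable dominating set in place of n gives the corresponding proportions for an EDSON.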



C. Performance of Content Distribution Algorithms in 
Equitable Dominating Set based Semantic Overlay 
Network 

An equitable dominating set D is a set of r dominating 
surrogate servers in the surrogate server set V, and 
$V \setminus D$ is the set of all vertices adjacent to the 
dominating node set D, such that the degrees of the vertices 
in D differ by at most 1. Each vertex v in D has more or 
less the same number of neighbor nodes that are members 
of the adjacent server set $V \setminus D$. Contents are 
therefore replicated only in the equitable dominating set of 
surrogate servers D instead of V. If the cardinality of D is 
r or less, the contents are replicated in at most r surrogate 
servers, which is always fewer than $|V|$, i.e. 
$|D| < |V|$. 






©2010 ACEEE 
DOI: 01.IJNS.01.03.24 






ACEEE Int. J. on Network Security, Vol. 01, No. 03, Dec 2010 



Therefore, the replication time proportions of the content 
distribution algorithms sequential unicast, multiple unicast, 
Fast Replica (FR), Resilient Fast Replica (R-FR) with failure 
of m servers, and Optimal Fast Replica (O-FR) in EDSON can 
be expressed as follows: 

r : 1 : 2/r : (2 + m/r) * 1/r : (k + r) / (r * r * k), where r < n. 

D. Simulation Experimental Study and Analysis: 

To evaluate the CDN, we used our complete simulation 
environment, called CDNsim [24], which simulates a main 
CDN infrastructure. It is based on the OMNeT++ library, which 
provides a discrete event simulation environment. 

All CDN networking issues, such as surrogate server 
selection, SON formation, replication of the content from 
the origin server to surrogate servers, implementation of 
the replication algorithms, propagation, and queuing, are 
computed dynamically via CDNsim, which provides a 
detailed implementation of the TCP/IP protocol, including 
packet switching, packet transmission upon misses, etc. 

Performance of different content distribution schemes in 
terms of Average Replication Time: 

We experimented with 12 different file sizes (100 KB, 
750 KB, 1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 
MB, 54 MB, 72 MB, and 128 MB) and 8 surrogate servers. Fig. 
1 shows the average replication time measured by the 
individual surrogate servers for these file sizes when 8 
surrogate servers are in the replication set. High 
variability of average replication time under Multiple 
Unicast and Sequential Unicast is observed for larger file 
sizes. 

The average content replication time under the Optimal Fast 
Replica algorithm across different file sizes in an 8 
surrogate server replication set is much more stable and 
predictable. Hence, in most cases Optimal Fast Replica 
outperforms the sequential unicast, multiple unicast, 
Fast Replica, and Resilient Fast Replica (R-FR) content 
distribution schemes. 



Fig. 1. Average Content Replication Times for various schemes 
(plot title: Average Replication Time Analysis) 



Performance of different content distribution schemes in 
terms of Maximum Replication Time 

We experimented with 12 different file sizes (100 KB, 
750 KB, 1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 
MB, 54 MB, 72 MB, and 128 MB) and 8 surrogate servers. 



Fig. 2. Maximum Content Replication Times for various schemes 
(plot title: Maximum Replication Time Analysis) 

Fig. 2 shows the maximum replication time measured by the 
individual recipient nodes for file sizes of 100 KB, 750 KB, 
1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 MB, 54 MB, 
72 MB, and 128 MB when 8 surrogate servers are in the 
replication set. 

High variability of maximum replication time under 
Sequential Unicast and Multiple Unicast is observed. 
The maximum file replication time under the Optimal Fast 
Replica (O-FR) algorithm across different file sizes in an 8 
surrogate server replication set is much more stable and 
predictable. Hence, in most cases the Optimal Fast Replica 
(O-FR) algorithm outperforms the sequential unicast, 
multiple unicast, Fast Replica, and Resilient Fast 
Replica (R-FR) content distribution schemes. 

Analysis on the impact of Equitable Dominating Set based 
SON 

Fig. 3. Impact of Equitable Dominating Set in SON based CDN 
Formation 

By using an equitable dominating set for the clustering of 
surrogate servers in the SON, the average number of 
surrogate servers needed for content replication is 
reduced to 60 percent or less. Although the number of 
surrogate servers is reduced, the redundancy does not 
change, because the proposed Optimal Fast Replica (O-FR) 
content distribution algorithm distributes the content to 
different surrogate servers, collects the content from the 
replica servers, and reconstructs it locally. 

Performance of different content distribution schemes in 
Equitable Dominating Set based SON in terms of Average 
Replication Time: 



Fig. 4. Performance of different Content Distribution Algorithms in 
EDSON 

Fig. 4 shows the average replication time measured by the 
individual recipient nodes for file sizes of 100 KB, 750 KB, 
1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 MB, 54 MB, 
72 MB, and 128 MB when the file is replicated in the 
dominating replication set of surrogate servers. We measured 
the average replication time of different content 
distribution algorithms, namely Optimal Fast Replica (O-FR), 
Resilient Fast Replica (R-FR) and Fast Replica (FR), across 
different file sizes in both the traditional SON based CDN 
and the DSON based CDN of surrogate servers. 

We observed that the average replication time of all 
three content distribution algorithms, Optimal Fast 
Replica (O-FR), Resilient Fast Replica (R-FR), and Fast 
Replica, is reduced by the use of the equitable dominating 
set, which reduces the number of surrogate servers on which 
content replication is carried out. 

Role of Equitable Dominating Set and surrogate server 
utilization: 

We evaluate the performance of the CDN in terms of net 
utility ($U_i$), which is given by the formula 

$U_i = \frac{2}{\pi} \arctan(a)$ (4) 

where a is the ratio of uploaded bytes to downloaded bytes. 
The resulting utility value lies in the range [0, 1]. 
The value of $U_i$ is: 

$U_i = 1$ if the surrogate server only uploads content 

$U_i = 0$ if the surrogate server only downloads content 

$U_i = 0.5$ if uploads and downloads are equal 
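
The net utility can be computed as in the following sketch. It assumes the scaling factor in equation (4) is 2/π, since that is the factor for which the three boundary values listed above (1, 0 and 0.5) hold. The function name is hypothetical, and `math.atan2` is used instead of dividing the byte counts so that a server with zero downloaded bytes does not cause a division by zero.

```python
import math


def net_utility(uploaded_bytes: float, downloaded_bytes: float) -> float:
    """Net utility U_i = (2 / pi) * arctan(uploaded / downloaded)."""
    # For non-negative inputs, atan2(u, d) equals arctan(u / d) when d > 0
    # and pi / 2 when d == 0 and u > 0 (upload-only server).
    return (2 / math.pi) * math.atan2(uploaded_bytes, downloaded_bytes)


print(net_utility(10, 0))   # upload only
print(net_utility(0, 10))   # download only
print(net_utility(5, 5))    # balanced
```

A server that uploads twice as much as it downloads gets a value between 0.5 and 1, so the metric rewards servers that contribute more content than they fetch.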

We investigated the use of different overlay 
construction methodologies such as Semantic Overlay 
Network (SON), Dominating Set based SON (DSON), and 
Equitable Dominating Set based SON (EDSON). It is 
observed that the net utility $U_i$ of individual surrogate 
servers is uniform in the Equitable Dominating Set based 
Semantic Overlay Network of surrogate servers, as depicted 
in Fig. 5. 



Fig. 5. Construction of SON Vs Net Utility (x-axis: Surrogate 
Server Number; series: Net Utility of ith Server in SON, in 
DS based SON, and in EDS based SON) 

V. CONCLUSION AND FUTURE WORK 

In this work, we first constructed an equitable dominating 
set based semantic overlay network (EDSON) of surrogate 
servers for replicating the content from the origin server to 
a set of surrogate servers, with the aim of placing the 
content nearer to the end user. 

We have conducted simulation experiments using 
CDNsim and analyzed the performance of content 
distribution algorithms in terms of average content 
replication time and maximum content replication time for 
large files over SON. It is found that Optimal Fast Replica 
(O-FR) algorithm outperforms other content distribution 
algorithms. 

We have performed both analytical study and empirical 
study for analyzing the performance of the content 
distribution algorithms. 

We also investigated the effect of the equitable dominating 
set in SON formation and how it is useful in reducing the 
redundancy. It is also observed that the equitable dominating 
set based SON is useful in keeping the average replication 
time stable and much more predictable even when the 
content distribution algorithm differs. We also investigated 
how the equitable dominating set based semantic overlay 
network helps keep the net utilization of individual 
surrogate servers stable and balance the load across them. 